SUMMARY


We study optimally bounded score functions for estimating regression
parameters in a generalized linear model. Our work extends results
obtained by Krasker & Welsch (1982) for the linear model and provides a
simple proof of Krasker and Welsch's first order condition for strong
optimality. The application of these results to logistic regression is
studied in some detail with an example given comparing the bounded
influence estimator with maximum likelihood.

Some key words: Bounded influence; Generalized linear models; Influential
points; Logistic regression; Outliers; Robustness.



1. INTRODUCTION


In a generalized linear model (McCullagh & Nelder, 1983, Ch. 2) a
response variable $Y$ and covariate vector $X$ are related via a conditional
density of the form

$$f(y \mid x) = \exp\{(y - h(x^T\theta))\,q(x^T\theta)\,\omega(x)/\sigma + c(y,\sigma)\}.$$

The functions $h(\cdot)$ and $q(\cdot)$ are subject to certain restrictions,
$\theta$ is a vector of regression parameters, $\sigma$ is a scale parameter
and $\omega(x)$ is a known weight function. In this paper we study the problem
of robustly estimating $\theta$ when $\omega(\cdot) \equiv 1$ and $\sigma$ is
known. Models of this type include logistic and probit regression, Poisson
regression, and certain models used in modeling lifetime data. In the case
where $\omega(\cdot) \equiv 1$ but $\sigma$ is unknown the methods presented
in Section 2 are still applicable with some modification to allow for joint
estimation of $\sigma$, cf. Krasker & Welsch (1982).

Our motivation for seeking robust estimators is the same as that
encountered in the context of the linear model: maximum likelihood
estimation is very sensitive to outlying data. For the case of logistic
regression, Pregibon (1981, 1982) has documented the nonrobustness of the
maximum likelihood estimator and expounded the benefits of diagnostics as
well as robust or resistant fitting procedures.

Much of the work on robust estimation concerns finding estimators
which sacrifice little efficiency at the assumed model while providing
protection against outliers and model violations. We follow this course,
finding bounded influence estimators minimizing certain functionals of the
asymptotic covariance matrix. Related work includes that of Hampel (1978),
Krasker (1980), and Krasker & Welsch (1982).

Two important issues when fitting models to data are (i) identification
of outliers and influential cases and (ii) accommodation of these
observations. Frequently when influential cases are present, the fitted
model is not representative of the bulk of the data. To rectify this, one
can simply delete influential cases and refit via standard methods, but
this approach lacks a theory for inference and testing; the effects of case
deletion upon the distributions of estimators are not well understood, even
asymptotically.

The robust techniques studied here provide a method of accommodating
anomalous data. They allow continuous downweighting of influential cases
and are amenable to asymptotic inference. Also, together with more direct
diagnostics, residuals and weights from a bounded influence fit can be used
to detect exceptional observations.

In Section 2 we present some general theory; this is specialized to
the case of logistic regression in Section 3; proofs of theorems are given
in an appendix.


2. THE GENERAL THEORY

2.1 The regression model

We study regression models in which the dependent variable $Y$ and
explanatory $p$-vector $X$ have a density of the form

$$g(y,x;\theta_0) = f(y; x^T\theta_0)\,s(x). \tag{2.1}$$

The conditional density of $Y$ given $X = x$ is $f(y; x^T\theta_0)$ and
depends on the unknown parameter $\theta_0$ only through $x^T\theta_0$;
$s(x)$ is the marginal density of $X$. Expectation with respect to
$g(y,x;\theta)$ is denoted by $E_\theta$ while $E_{\theta|x}$ indicates
conditional expectation corresponding to $f(y; x^T\theta)$. Model (2.1)
includes many generalized linear models (McCullagh & Nelder, 1983, Ch. 2).

Suppose $(Y_i, X_i)$, $(i = 1,\ldots,n)$ are independent copies of $(Y,X)$.
Under regularity conditions the maximum likelihood estimator
$\hat\theta_{ML}$ of $\theta_0$ satisfies

$$\sum_{i=1}^{n} \ell(Y_i, X_i, \hat\theta_{ML}) = 0,$$

where $\ell(y,x,\theta) = (\partial/\partial\theta)\{\log f(y; x^T\theta)\}$,
and $n^{1/2}(\hat\theta_{ML} - \theta_0)$ converges in distribution to a
$p$-dimensional normal random variate with mean zero and covariance matrix
$V(\theta_0) = E_{\theta_0}^{-1}\{\ell(Y,X,\theta_0)\,\ell^T(Y,X,\theta_0)\}$.


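2.2 M-estimators and their influence curves

We will consider estimators $\hat\theta_\psi$ satisfying

$$\sum_{i=1}^{n} \psi(Y_i, X_i, \hat\theta_\psi) = 0,$$

for suitably chosen functions $\psi$ from $R \times R^p \times R^p$ to $R^p$.
We require that $\psi$ be unbiased, i.e.,

$$E_\theta\{\psi(Y,X,\theta)\} = 0. \tag{2.2}$$

Under regularity conditions (Huber, 1967), $\hat\theta_\psi$ is consistent
and asymptotically normal with influence curve

$$IC_\psi(y,x,\theta) = D_\psi^{-1}(\theta)\,\psi(y,x,\theta) \tag{2.3}$$

where

$$D_\psi(\theta) = -(\partial/\partial\alpha)\,[E_\theta\{\psi(Y,X,\alpha)\}]_{\alpha=\theta}. \tag{2.4}$$

Write $\psi(\theta)$ and $\ell(\theta)$ for $\psi(Y,X,\theta)$ and
$\ell(Y,X,\theta)$ respectively. Assuming that integration and
differentiation can be interchanged in (2.2) and (2.4) it is easy to show
that

$$D_\psi(\theta) = E_\theta\{\psi(\theta)\,\ell^T(\theta)\}. \tag{2.5}$$

Now let

$$W_\psi(\theta) = E_\theta\{\psi(\theta)\,\psi^T(\theta)\}. \tag{2.6}$$

It then follows (Huber, 1967) that the asymptotic variance of
$n^{1/2}(\hat\theta_\psi - \theta_0)$ is

$$V_\psi(\theta_0) = D_\psi^{-1}(\theta_0)\,W_\psi(\theta_0)\,(D_\psi^{-1}(\theta_0))^T.$$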
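As a rough illustration of the variance formula above, the following Python sketch forms sample analogues of $D_\psi$ in (2.5) and $W_\psi$ in (2.6) and combines them into the sandwich form. It is a sketch under assumptions, not the authors' computation: the function name `sandwich_covariance` is hypothetical, and plain sample averages stand in for the expectations $E_\theta$.

```python
import numpy as np

def sandwich_covariance(psi, ell):
    """Sample analogue of V_psi = D^{-1} W (D^{-1})^T.

    psi : (n, p) array whose rows are psi(Y_i, X_i, theta)
    ell : (n, p) array whose rows are the likelihood scores l(Y_i, X_i, theta)
    Assumption: sample averages replace the expectations in (2.5) and (2.6).
    """
    n = psi.shape[0]
    D = psi.T @ ell / n          # analogue of E{psi l^T}, equation (2.5)
    W = psi.T @ psi / n          # analogue of E{psi psi^T}, equation (2.6)
    D_inv = np.linalg.inv(D)
    return D_inv @ W @ D_inv.T
```

With `psi` equal to `ell` the result reduces to the inverse of the averaged outer product of scores, and multiplying `psi` by a constant nonsingular matrix leaves the result unchanged, which is why scores are only identified up to such a factor.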


For robustness we want $IC_\psi$ to be bounded; for efficiency we want
$V_\psi$ to be small. In the next section we define a norm for $IC_\psi$ and
outline a theory which suggests efficient bounded score functions.


2.3 A scalar measure of influence and an optimal score function
As a scalar measure of maximum influence we employ a definition of
sensitivity introduced by Stahel in his Swiss Federal Institute of
Technology Ph.D. thesis, and by Krasker & Welsch (1982). The self
standardized sensitivity of the estimator $\hat\theta_\psi$ is defined as

$$s(\psi) = \sup_{(y,x)} \sup_{\lambda \neq 0} \frac{|\lambda^T IC_\psi|}{(\lambda^T V_\psi \lambda)^{1/2}}
= \sup_{(y,x)} (IC_\psi^T V_\psi^{-1} IC_\psi)^{1/2}
= \sup_{(y,x)} (\psi^T W_\psi^{-1} \psi)^{1/2}. \tag{2.7}$$

For a generalized linear model $s(\psi)$ has a natural interpretation in
terms of the link function, e.g., in logistic regression $s(\psi)$ measures
the maximum normalized influence of $(y,x)$ on an estimated logit in that
$x^T IC_\psi$ is the influence curve for $x^T\hat\theta_\psi$ and
$x^T V_\psi x$ is the asymptotic variance of $x^T\hat\theta_\psi$. Although
this paper studies only the self standardized sensitivity we believe that
useful estimators can also be obtained by bounding other measures of
influence, such as the influence on fitted values.

For maximum likelihood $\psi = \ell$ and, in general, $s(\ell) = +\infty$.
To obtain robustness we limit attention to only those estimators
$\hat\theta_\psi$ for which

$$s(\psi) \le b < \infty. \tag{2.8}$$

Such an estimator is said to have bounded influence with bound $b$.
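To make the sensitivity in (2.7) concrete, the sketch below evaluates its last form, $\sup (\psi^T W_\psi^{-1}\psi)^{1/2}$, over a sample of score values. The function name is hypothetical, and two simplifications are assumed: $W_\psi$ is replaced by a sample average, and the supremum is taken over the observed points only, so the value is a lower bound on $s(\psi)$ rather than $s(\psi)$ itself.

```python
import numpy as np

def self_standardized_sensitivity(psi):
    """Empirical version of sup (psi^T W^{-1} psi)^{1/2} from (2.7).

    psi : (n, p) array whose rows are psi(Y_i, X_i, theta).
    Assumptions: W is the sample average of psi psi^T, and the supremum
    is restricted to the n observed points.
    """
    n = psi.shape[0]
    W = psi.T @ psi / n
    # Quadratic form psi_i^T W^{-1} psi_i for every row i.
    q = np.einsum("ij,ij->i", psi @ np.linalg.inv(W), psi)
    return np.sqrt(q.max())
```

The value is unchanged when every `psi` row is multiplied by the same nonsingular matrix, reflecting the self standardization.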

Consider the score function

$$\psi_{BI}(y,x,\theta) = (\ell - C)\,\min{}^{1/2}\{1,\; b^2/((\ell - C)^T B^{-1} (\ell - C))\}, \tag{2.9}$$

where $\ell = \ell(y,x,\theta)$ and $C_{p \times 1} = C(\theta)$ and
$B_{p \times p} = B(\theta)$ are functions of $\theta$ defined implicitly by
the equations

$$E_\theta\{\psi_{BI}(Y,X,\theta)\} = 0, \qquad B(\theta) = E_\theta\{\psi_{BI}\,\psi_{BI}^T\}. \tag{2.10}$$

With $C(\theta)$ and $B(\theta)$ so defined, $\psi_{BI}$ is unbiased and
$W_{\psi_{BI}}(\theta) = B(\theta)$, so that by (2.7), $\hat\theta_{BI}$ has
bounded sensitivity.
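The downweighting in (2.9) can be sketched as follows for given values of $C$ and $B$. This is an illustration under assumptions: the weight is read as $\min\{1, b/d\}$ with $d^2 = (\ell - C)^T B^{-1}(\ell - C)$, the function name `psi_bi` is hypothetical, and $C$ and $B$ are supplied by the caller rather than solved from (2.10).

```python
import numpy as np

def psi_bi(ell, C, B, b):
    """Bounded score (l - C) * w with w = min{1, b / d}, as read from (2.9).

    ell : (n, p) array of likelihood scores
    C   : (p,) centering vector, B : (p, p) positive definite matrix
    b   : bound; C and B are assumed given, not solved from (2.10) here.
    Returns the (n, p) bounded scores and the (n,) weights.
    """
    r = ell - C
    d2 = np.einsum("ij,ij->i", r @ np.linalg.inv(B), r)
    with np.errstate(divide="ignore"):
        # d2 == 0 gives b^2 / d2 = inf, so the weight is 1 for that row.
        w = np.sqrt(np.minimum(1.0, b**2 / d2))
    return r * w[:, None], w
```

By construction each returned row satisfies $\psi^T B^{-1}\psi \le b^2$, and rows with $d \le b$ are left unweighted.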

The vector $C(\theta)$ and matrix $B(\theta)$ are analogous to robust
multivariate location and scatter functionals for $\ell(Y,X,\theta)$
(Maronna, 1976). For sufficiently large $b$ solutions $C(\theta)$ and
$B(\theta)$ satisfying (2.10) exist, and as $b$ tends to infinity these tend
to zero and $E\{\ell\ell^T\}$ respectively. Equation (2.9) shows that
$\psi_{BI}$ is similar to a weighted maximum likelihood score with weights
depending on the distance $(\ell - C)^T B^{-1} (\ell - C)$; as $b$ tends to
infinity the weighting factor tends to one and $\psi_{BI}$ to $\ell$.

For the normal theory linear model $\psi_{BI}$ is the score function found
by Krasker & Welsch (1982), who show that if there exists a score
$\psi_{opt}$ satisfying (2.2) and (2.8) which minimizes $V_\psi$ in the
strong sense of positive definiteness, i.e., $V_\psi - V_{\psi_{opt}} \ge 0$
for all $\psi$, then it must be of the form (2.9). That $\psi_{BI}$ possesses
similar optimality properties is seen in Corollary 1.1 below.


THEOREM 1. If for a given choice of $b > 0$ equations (2.10) possess the
solution $(C(\theta), B(\theta))$, then $\psi_{BI}$ minimizes
$\mathrm{tr}(V_\psi V_{\psi_{BI}}^{-1})$ among all $\psi$ satisfying (2.2) and

$$\sup_{(y,x)} (IC_\psi^T\, V_{\psi_{BI}}^{-1}\, IC_\psi) \le b^2. \tag{2.11}$$

With the exception of multiplication by a constant matrix, $\psi_{BI}$ is
unique almost surely.


Any score function $\psi_{opt}$ for which $V_\psi - V_{\psi_{opt}} \ge 0$ for
all $\psi$ will be called strongly efficient; we now state the following
corollary.

COROLLARY 1.1. If there exists an unbiased, strongly efficient score
$\psi_{opt}$ satisfying (2.8), then $\psi_{opt}$ is equivalent to $\psi_{BI}$
whenever the latter is defined.


Remarks 1. In Theorem 1 the conditions for optimality of $\psi_{BI}$ depend
on $\psi_{BI}$ itself through $V_{\psi_{BI}}$. This is somewhat
disconcerting. Nevertheless $\psi_{BI}$ does satisfy an optimality property
and this result allows us to prove Corollary 1.1.

2. Working within the class of score functions of the form
$\ell(y,x,\theta)\,w(y,x,\theta)$ where $w$ is a scalar weight function,
Krasker & Welsch (1982) find the optimal form of $w$. Theorem 1 and its
corollary show that $\psi_{BI}$ is optimal over a much larger class of
functions and hence yield a technically stronger result than Krasker and
Welsch's. Also our proof is somewhat simpler than Krasker and Welsch's.

3. Ruppert (1985) has shown that a strongly efficient score need not
exist, in which case Corollary 1.1 is vacuous. In fact, we know of no case
with $p \ge 2$ where a strongly efficient score has been shown to exist.
However, the result given in Corollary 1.1 is still of interest; Ruppert
(1985) uses it in his counterexample.

4. The proofs of Theorem 1 and its corollary are presented in the appendix.


2.4 A one-step estimator


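Write $\psi_{BI} = \psi_{BI}(y,x,\theta,B,C)$ to indicate dependence on $B$
and $C$. Theorem 1 suggests the estimator $\hat\theta_{BI}$ obtained by
solving

$$\sum_{i=1}^{n} \psi_i(\theta) = 0$$

where

$$\psi_i(\theta) = \psi_{BI}(Y_i, X_i, \theta, B(\theta), C(\theta))$$

and $C(\theta)$ and $B(\theta)$ are defined implicitly by the equations

$$\sum_{i=1}^{n} E_{\theta|X_i}\{\psi_i(\theta)\} = 0 \tag{2.12}$$

$$B(\theta) = n^{-1} \sum_{i=1}^{n} E_{\theta|X_i}\{\psi_i(\theta)\,\psi_i^T(\theta)\}. \tag{2.13}$$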


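In the linear model (Krasker & Welsch, 1982) symmetry implies
$C(\theta) \equiv 0$ so that finding $\hat\theta_{BI}$ is greatly
simplified. For non-linear models solving for $\hat\theta_{BI}$ is much more
difficult, so we suggest the following one-step procedure. Let
$\tilde\theta$ be an initial root-n consistent estimator of $\theta_0$.
Compute $B(\tilde\theta)$ and $C(\tilde\theta)$ iteratively from (2.12) and
(2.13). Define

$$\hat\theta_{BI}^{(1)} = \tilde\theta + D^{-1}(\tilde\theta)\; n^{-1}\sum_{i=1}^{n} \psi_i(\tilde\theta)$$

where

$$D(\theta) = n^{-1} \sum_{i=1}^{n} E_{\theta|X_i}\{\psi_i(\theta)\,\ell^T(Y_i, X_i, \theta)\}.$$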


This construction is similar to Bickel's (1975) Type II one-step
procedure. Under regularity conditions $\hat\theta_{BI}^{(1)}$ is consistent
and asymptotically normal with covariance matrix
$V_{BI}(\theta_0) = D_{BI}^{-1}(\theta_0)\,B(\theta_0)\,(D_{BI}^{-1}(\theta_0))^T$,
which is consistently estimated by
$\hat V = D^{-1}(\tilde\theta)\,B(\tilde\theta)\,(D^{-1}(\tilde\theta))^T$.
To preserve finite sample robustness we suggest that $\tilde\theta$ also be
resistant to outliers.
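The one-step update has the generic form "initial estimate plus $D^{-1}$ times the averaged score". The sketch below shows that form and, purely as an illustration, iterates it in the unbounded case $\psi = \ell$ for a logistic model, where it coincides with Fisher scoring. Everything here is an assumption for illustration: the helper name `one_step`, the six-point toy data set, and the use of the likelihood score in place of $\psi_{BI}$.

```python
import numpy as np

def one_step(theta_init, psi, D):
    """theta_init + D^{-1} n^{-1} sum_i psi_i, a one-step update of this type."""
    return theta_init + np.linalg.solve(D, psi.mean(axis=0))

def F(t):
    """Logistic distribution function."""
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical toy data: intercept plus one covariate (not the paper's data).
X = np.column_stack([np.ones(6), np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

# With psi = l (no bounding), repeating the step is Fisher scoring.
theta = np.zeros(2)
for _ in range(25):
    p = F(X @ theta)
    ell = (y - p)[:, None] * X                       # rows l(y_i, x_i, theta)
    D = (X * (p * (1 - p))[:, None]).T @ X / len(X)  # n^{-1} sum E{l l^T | x_i}
    theta = one_step(theta, ell, D)
```

In the bounded influence setting only a single step is taken, starting from a robust root-n consistent initial estimate, so the quality of that starting value matters.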


3. APPLICATION TO LOGISTIC REGRESSION

3.1 The logistic model

Logistic regression is a special case of model (2.1) in which $Y$ is an
indicator variable such that

$$P(Y = 1 \mid X = x) = F(x^T\theta_0), \qquad F(t) = 1/(1 + \exp(-t)).$$

The general applicability of this form of binary regression is discussed by
Berkson (1951), Cox (1970), and Efron (1975). The likelihood score is
$\ell(y,x,\theta) = (y - F(x^T\theta))\,x$ and the maximum likelihood
estimator is consistent and asymptotically normal with covariance matrix
$V(\theta_0) = E_{\theta_0}^{-1}\{F^{(1)}(X^T\theta_0)\,XX^T\}$ where
$F^{(1)}(t) = (d/dt)F(t)$.
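For reference, the logistic quantities just defined can be written out numerically. The sketch below is only illustrative: the function names are hypothetical, and the expectation in $V(\theta)$ is replaced by an average over the rows of a design matrix.

```python
import numpy as np

def F(t):
    """Logistic distribution function F(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def likelihood_score(y, X, theta):
    """Rows are l(y_i, x_i, theta) = (y_i - F(x_i^T theta)) x_i."""
    return (y - F(X @ theta))[:, None] * X

def ml_covariance(X, theta):
    """Sample analogue of V(theta) = E^{-1}{F'(X^T theta) X X^T}.

    Uses F'(t) = F(t)(1 - F(t)); the expectation over X is replaced by
    an average over the rows of X (an assumption of this sketch).
    """
    p = F(X @ theta)
    info = (X * (p * (1 - p))[:, None]).T @ X / len(X)
    return np.linalg.inv(info)
```

Dividing `ml_covariance(X, theta)` by the sample size gives the usual approximate covariance of the maximum likelihood estimate.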


3.2 Constructing the one-step estimator for the logistic model

The first step in computing $\hat\theta_{BI}^{(1)}$ entails finding an
easily computed, robust, root-n consistent estimator $\tilde\theta$. We find
an optimal score function from among the class

$$\mathcal{M} = \{\psi : \psi(y,x,\theta) = (y - F(x^T\theta))\,\omega(x,\theta)\}$$

where $\omega(\cdot,\cdot)$ is a $p$-vector valued function of $x$ and
$\theta$ but not $y$. The advantage, in terms of computational simplicity,
of restricting attention to score functions in $\mathcal{M}$ is that
condition (2.2) is automatically satisfied and it is not necessary to
estimate a robust location functional.

The estimator we propose, and call a bounded leverage estimator,
corresponds to the score

$$\psi_{BL}(y,x,\theta) = (y - F(x^T\theta))\,x\,\min{}^{1/2}\{1,\; b^2/(m^2(x^T\theta)\,x^T Q^{-1}(\theta)\,x)\},$$

where $Q = Q(\theta)$ is an implicitly defined function of $\theta$
satisfying

$$Q(\theta) = E_\theta\{F^{(1)}(X^T\theta)\,XX^T \min\{1,\; b^2/(m^2(X^T\theta)\,X^T Q^{-1} X)\}\}, \tag{2.14}$$

and $m(\cdot)$ is the function $m(t) = \max(F(t), 1 - F(t))$. In L. A.
Stefanski's University of North Carolina Ph.D. thesis it is shown that in
order for (2.14) to possess a solution $Q > 0$, it is necessary that

$$b^2 > p \,/\, E\{F^{(1)}(X^T\theta)/m^2(X^T\theta)\}. \tag{2.15}$$

Condition (2.15) is generally not sufficient however. Note that with $Q$
satisfying (2.14), $W_{\psi_{BL}} = Q$ and by (2.7), $\tilde\theta_{BL}$ has
bounded influence.
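One way to approach the implicit equation (2.14) numerically is simple fixed-point iteration on a sample version of it, sketched below. This is not the algorithm of the thesis cited above, which is not reproduced here: the function name `solve_Q`, the starting value, and the stopping rule are all assumptions, the expectation is replaced by an average over the rows of a design matrix, and convergence is not guaranteed, in particular when $b$ is close to the lower limit in (2.15).

```python
import numpy as np

def F(t):
    """Logistic distribution function."""
    return 1.0 / (1.0 + np.exp(-t))

def solve_Q(X, theta, b, n_iter=200, tol=1e-10):
    """Fixed-point iteration for a sample version of (2.14).

    Q <- n^{-1} sum_i F'(x_i^T theta) x_i x_i^T
                      * min{1, b^2 / (m_i^2 x_i^T Q^{-1} x_i)}
    Assumption: plain iteration from the unweighted matrix (b = infinity).
    """
    p = F(X @ theta)
    f1 = p * (1 - p)                     # F'(x^T theta)
    m2 = np.maximum(p, 1 - p) ** 2       # m^2(x^T theta)
    Q = (X * f1[:, None]).T @ X / len(X)
    for _ in range(n_iter):
        lev = np.einsum("ij,ij->i", X @ np.linalg.inv(Q), X)  # x^T Q^{-1} x
        w2 = np.minimum(1.0, b**2 / (m2 * lev))
        Q_new = (X * (f1 * w2)[:, None]).T @ X / len(X)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    raise RuntimeError("no convergence; the bound b may be too small")
```

For very large `b` every weight equals one and the function returns the averaged information matrix immediately; smaller bounds shrink `Q` by downweighting high-leverage rows.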



We are able to restrict attention to only those $\psi$ in $\mathcal{M}$ and
still obtain bounded influence simply because the absolute residual
$|y - F(x^T\theta)|$ is bounded. However, $\psi_{BL}$ takes a pessimistic
view in downweighting observations in accordance with their maximum
potential influence determined by their position in the design space and by
$\theta$. The term leverage is often used to denote potential influence
(Cook & Weisberg, 1983) and hence the name bounded leverage. Potential
influence is often far greater than the actual influence when the
observation is well fit by the model. Although downweighting such points
results in a loss of efficiency for $\tilde\theta_{BL}$ this will not affect
the efficiency of our one-step estimator. Also, as the following results
show, $\psi_{BL}$ is the most efficient score in $\mathcal{M}$.

THEOREM 2. If for a given choice of $b > 0$ equation (2.14) possesses the
solution $Q > 0$ then $\psi_{BL}$ minimizes
$\mathrm{tr}(V_\psi V_{\psi_{BL}}^{-1})$ among all $\psi$ in $\mathcal{M}$
satisfying

$$\sup_{(y,x)} (IC_\psi^T\, V_{\psi_{BL}}^{-1}\, IC_\psi) \le b^2.$$

With the exception of multiplication by a constant matrix, $\psi_{BL}$ is
unique almost surely.

COROLLARY 2.1. If there exists a strongly efficient score $\psi_{opt}$ in
$\mathcal{M}$, then $\psi_{opt}$ is equivalent to $\psi_{BL}$ whenever the
latter is defined.

Remarks 1. Proofs are similar to those of Theorem 1 and its corollary and
will not be given.

2. The extent to which Theorem 2 generalizes to other regression models
is limited, since it requires that $\ell(y,x,\theta)$ be a bounded function
of $y$.


For the one-step construction in Section 2.4 to work it is necessary that
$\tilde\theta$ be root-n consistent. In Stefanski's Ph.D. thesis it is shown
that $n^{1/2}(\tilde\theta - \theta_0)$ is asymptotically normal with
covariance matrix
$V_{BL}(\theta_0) = D_{BL}^{-1}(\theta_0)\,Q(\theta_0)\,(D_{BL}^{-1}(\theta_0))^T$
provided:

(i) $b$ is sufficiently large,

(ii) $E\{\|X\|^2\} < \infty$,

(iii) $E\{F^{(1)}(X^T\theta)\,XX^T\,\|X\|^{-1}\}$ is positive definite,

(iv) $(\partial/\partial Q)\,E\{J(X,\theta,Q)\}$ is nonsingular where

$$J(X,\theta,Q) = Q - F^{(1)}(X^T\theta)\,XX^T \min\{1,\; b^2/(m^2(X^T\theta)\,X^T Q^{-1} X)\}.$$

The key assumptions are (iii) and (iv) which are similar to Assumption 7 of
Krasker & Welsch (1982).

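As an estimate of $V_{BL}$ we use
$\hat V_{BL} = \hat D^{-1}(\tilde\theta)\,Q(\tilde\theta)\,(\hat D^{-1}(\tilde\theta))^T$
where

$$\hat D(\theta) = n^{-1} \sum_{i=1}^{n} E_{\theta|X_i}\{\psi_{BL}(Y_i, X_i, \theta, Q(\theta))\,\ell^T(Y_i, X_i, \theta)\}.$$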

An algorithm for computing $\tilde\theta$ and $\hat\theta_{BI}^{(1)}$ for
logistic regression models appears in Stefanski's Ph.D. thesis. To fully
specify the algorithm one must determine the bound $b$. For $\tilde\theta$
this was chosen as a constant multiple of $b(\theta)$ where

$$b^2(\theta) = p \,\Big/\, \Big[n^{-1} \sum_{i=1}^{n} \big(F^{(1)}(x_i^T\theta)/m^2(x_i^T\theta)\big)\Big],$$

see (2.15). For the examples in the next section we took the bound to be
$(1.5)\,b(\theta)$; this same bound was then used for the one-step estimator
$\hat\theta_{BI}^{(1)}$. The choice $(1.5)\,b(\theta)$ was suggested by
experience; it is sufficiently small to provide protection from extreme
observations yet large enough to avoid computational problems.
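The sample quantity $b(\theta)$ used to calibrate the bound is straightforward to compute. The following sketch is illustrative only: the function name `bound_b` and its `multiple` argument are hypothetical, and the value of $\theta$ at which it is evaluated is left to the caller.

```python
import numpy as np

def F(t):
    """Logistic distribution function."""
    return 1.0 / (1.0 + np.exp(-t))

def bound_b(X, theta, multiple=1.0):
    """multiple * b(theta), with b^2(theta) = p / mean_i(F'_i / m_i^2).

    F'_i = F(x_i^T theta)(1 - F(x_i^T theta)) and
    m_i  = max(F(x_i^T theta), 1 - F(x_i^T theta)).
    """
    prob = F(X @ theta)
    ratio = prob * (1 - prob) / np.maximum(prob, 1 - prob) ** 2
    return multiple * np.sqrt(X.shape[1] / ratio.mean())
```

For example, `bound_b(X, theta, multiple=1.5)` gives a bound of the form $(1.5)\,b(\theta)$. At $\theta = 0$ every ratio equals one, so $b(0) = p^{1/2}$.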


3.3 Example


We apply our results to data relating participation in the U.S. Food
Stamp Program to various socioeconomic indicators. The data, which are
available from the first author, were selected at random from a cohort of
over 2000 elderly citizens. The covariates are (i) tenancy, indicating
home ownership; (ii) supplemental income, indicating whether some form of
supplemental security income is received; and (iii) monthly income. In our
sample of 150 there were 24 cases of participation.

The researcher who supplied these data had been using probit regression
with monthly income entering linearly in the model. A fit of the
logistic model with covariates tenancy, supplemental income, and (monthly
income)/10 produced Table 1(a).

Apart from the constant $C$, $\hat\theta_{BI}$ is a weighted maximum
likelihood estimator with weights
$w_i = \min{}^{1/2}\{1,\; b^2/((\ell_i - C)^T B^{-1} (\ell_i - C))\}$, where
$\ell_i = \ell(Y_i, X_i, \theta)$, see equation (2.9). Estimated weights
less than one indicate influential or ill-fitting observations. For the
analysis in Table 1(a) the only weights less than one were 0.69, 0.40, 0.98
and 0.62. Since these observations correspond to the four largest incomes
among those receiving food stamps a transformation of income is indicated.

In Table 1(b) we present the analysis with log(monthly income + 1)
replacing (monthly income)/10. This transformation substantially reduces
the leverage of large income values but increases the leverage of small
income values. For this model the bounded influence estimator downweighted
only two observations with $\hat w_5 = 0.21$ and $\hat w_{66} = 0.76$. Case
66 has the largest income among those participating while case 5 has the
smallest income among those not participating. Apparently cases #5 and #66
are influencing the maximum likelihood fit; this is indicated to a great
extent by the bounded influence analysis and even more so by the maximum
likelihood fit with the two outlying cases removed.

An advantage of robust methods over maximum likelihood is that
residual plots are more reliable for uncovering outliers. This is
illustrated in Figure 1. Residuals (Cox (1970), p. 96; Pregibon (1981)) are
plotted for both the maximum likelihood and bounded influence fits.

3.4 Conclusions

Our bounded influence procedure provides a method of fitting meaningful
models in the presence of anomalous data. Since similar models can be
obtained by diagnosing and deleting outliers it is worth emphasizing that,
unlike the method of case deletion, robust methods are amenable to
asymptotic inference; this feature is important whenever hypothesis testing
or confidence regions are objectives.

Robust procedures also supply useful diagnostic tools for model
building. Variable selection, as well as estimation, can be influenced by
anomalous data; Pregibon (1982) cites such an example. Often robust methods
suggest variables appropriate for modeling the bulk of the data which would
otherwise go undetected in a standard maximum likelihood analysis.
Conversely, with non-resistant fitting, a variable might be used in the model
simply to accommodate a single outlier. In addition to variable selection,
the weights and residuals from a robust fit provide useful supplements to
more direct diagnostics. For example, with the food stamp data, an
analyst, seeing the impact of case five, might question the validity of
that observation or the appropriateness of the model over the full range of
incomes.

The  research  of  the  first  two  authors  was  supported  by  the  Air  Force 
Office  of  Scientific  Research  while  that  of  the  third  author  was  supported 
APPENDIX

Proofs of Theorem 1 and Corollary 1.1

Theorem 1 is a generalization of Appendix A in Hampel (1978) and the
proof given here uses techniques found in Krasker (1980).

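Proof of Theorem 1. Let $\psi$ be any competitor to $\psi_{BI}$. Without
loss of generality assume that $\psi = IC_\psi$, i.e. that $\psi$ is in
canonical form in the sense of Hampel (1974). This is equivalent to assuming

$$E_\theta\{\psi(Y,X,\theta)\,\ell^T(Y,X,\theta)\} = I_{p \times p}, \tag{A.1}$$

and implies $V_\psi(\theta) = E_\theta\{\psi(Y,X,\theta)\,\psi^T(Y,X,\theta)\}$.

Now write $\ell$ for $\ell(Y,X,\theta)$ and $\psi$ for $\psi(Y,X,\theta)$,
and write $D_{BI}$ and $V_{BI}$ for $D_{\psi_{BI}}$ and $V_{\psi_{BI}}$. If
$\psi$ satisfies (A.1) and (2.2) then

$$E_\theta[\{D_{BI}^{-1}(\ell - C) - \psi\}\{D_{BI}^{-1}(\ell - C) - \psi\}^T]
= D_{BI}^{-1} E_\theta\{(\ell - C)(\ell - C)^T\}(D_{BI}^{-1})^T - D_{BI}^{-1} - (D_{BI}^{-1})^T + V_\psi(\theta).$$

Therefore $\mathrm{tr}(V_\psi V_{BI}^{-1})$ is, neglecting an additive
constant independent of $\psi$, proportional to

$$E_\theta[\{D_{BI}^{-1}(\ell - C) - \psi\}^T\, V_{BI}^{-1}\, \{D_{BI}^{-1}(\ell - C) - \psi\}]. \tag{A.2}$$

Define $\psi^* = V_{BI}^{-1/2}\psi$; in terms of $\psi^*$, (A.2) becomes

$$E_\theta\{\|V_{BI}^{-1/2} D_{BI}^{-1}(\ell - C) - \psi^*\|^2\}. \tag{A.3}$$

Note that $\|\psi^*\|^2 = \psi^T V_{BI}^{-1} \psi$ and thus subject to
(2.11), equation (A.3) is minimized, as a function of $\psi^*$, by

$$\psi^* = V_{BI}^{-1/2} D_{BI}^{-1}(\ell - C)\,\min{}^{1/2}\{1,\; b^2/((\ell - C)^T (D_{BI}^{-1})^T V_{BI}^{-1} D_{BI}^{-1} (\ell - C))\}. \tag{A.4}$$

Condition (A.1) insures that $\psi$ is unique almost surely. Equations
(2.3), (2.5), (2.6), and (2.10) imply
$(D_{BI}^{-1})^T V_{BI}^{-1} D_{BI}^{-1} = B^{-1}$; thus in terms of $\psi$,
(A.4) becomes $\psi = D_{BI}^{-1}\psi_{BI}$, proving the theorem. //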
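The key step in the proof is a pointwise minimization: for a fixed vector $a$, the point $z$ with $\|z\| \le b$ closest to $a$ is $a \min\{1, b/\|a\|\}$, the projection of $a$ onto the ball of radius $b$. The sketch below checks this numerically; the function name `project_to_ball` and the random comparison points are illustrative assumptions, not part of the proof.

```python
import numpy as np

def project_to_ball(a, b):
    """Return a * min{1, b / ||a||}, the closest point to a with norm <= b."""
    norm = np.linalg.norm(a)
    if norm == 0.0:
        return a
    return a * min(1.0, b / norm)

# Illustrative check against random feasible competitors.
rng = np.random.default_rng(2)
a = np.array([3.0, 4.0])
b = 2.0
z = project_to_ball(a, b)
candidates = rng.normal(size=(1000, 2))
candidates = candidates[np.linalg.norm(candidates, axis=1) <= b]
best_competitor = np.linalg.norm(candidates - a, axis=1).min()
```

No feasible competitor comes closer to `a` than `z` does, which is the sense in which the truncated form in (A.4) minimizes the integrand of (A.3) at each $(y,x)$.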


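Proof of Corollary 1.1. Again assume that all scores are in canonical form
and satisfy (2.2). Define

$$S = \{\psi : \sup_{(y,x)} \psi^T V_\psi^{-1} \psi \le b^2\}, \qquad
S_{BI} = \{\psi : \sup_{(y,x)} \psi^T V_{BI}^{-1} \psi \le b^2\}.$$

We must show that if there exists $\psi_{opt}$ in $S$ such that
$V_{\psi_{opt}} \le V_\psi$ for all $\psi$ in $S$, then $\psi_{opt}$ is
equivalent to $\psi_{BI}$. Clearly $\psi_{BI}$ is in $S$, thus by assumption
$V_{\psi_{opt}} \le V_{BI}$. From this it follows that

$$\psi_{opt}^T V_{BI}^{-1} \psi_{opt} \le \psi_{opt}^T V_{\psi_{opt}}^{-1} \psi_{opt} \le b^2,$$

and hence $\psi_{opt}$ is in $S_{BI}$. Let $I = S \cap S_{BI}$. The set $I$
is nonempty; it contains $D_{BI}^{-1}\psi_{BI}$ and $\psi_{opt}$. For any
$\psi$ in $I$ we know $V_{\psi_{opt}} \le V_\psi$ and hence

$$\mathrm{tr}(V_{\psi_{opt}} V_{BI}^{-1}) \le \mathrm{tr}(V_\psi V_{BI}^{-1})$$

for all $\psi$ in $I$. But Theorem 1 proves that $\psi_{BI}$, when defined,
is the almost everywhere unique minimizer of $\mathrm{tr}(V_\psi V_{BI}^{-1})$
among all $\psi$ in $S_{BI}$. The equivalence of $\psi_{opt}$ and
$\psi_{BI}$ follows. //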


Table 1(a). Estimates for the logistic regression model with covariates
tenancy, supplemental income, and (monthly income)/10; p-values in
parentheses.

Estimator                 intercept        tenancy          supplemental     (monthly
                                                            income           income)/10
$\hat\theta_{ML}$         -0.34 (0.5287)   -1.76 (0.0009)   0.78 (0.1259)    -0.01 (0.1122)
$\tilde\theta$            -0.16 (0.7872)   -1.75 (0.0014)   0.77 (0.1360)    -0.02 (0.0826)
$\hat\theta_{BI}^{(1)}$   -0.20 (0.6006)   -1.76 (0.0012)   0.78 (0.1300)    -0.02 (0.0922)

Table 1(b). Estimates for the logistic regression model with covariates
tenancy, supplemental income, and log(monthly income + 1); p-values in
parentheses.

Estimator                 intercept        tenancy          supplemental     log(monthly
                                                            income           income + 1)
$\hat\theta_{ML}$         0.93 (0.5681)    -1.85 (0.0005)   0.90 (0.0737)    -0.33 (0.2228)
$\tilde\theta$            4.14 (0.1030)    -1.81 (0.0007)   0.75 (0.1444)    -0.86 (0.0430)
$\hat\theta_{BI}^{(1)}$   4.02 (0.1100)    -1.81 (0.0006)   0.76 (0.1416)    -0.84 (0.0465)
$\hat\theta_{ML}$ *       6.88 (0.0160)    -2.02 (0.0004)   0.76 (0.1586)    -1.33 (0.0062)

* With cases #5 and #66 excluded


[Figure 1: plot of residual against observation number.]

Figure 1. Residual Plots for FS Data. Maximum likelihood residuals are
indicated by circles 'o'; residuals from the bounded influence fit by
asterisks '*'. For both estimation procedures residuals are defined as in
Cox (1970), p. 96. Negligible residuals have been omitted for clarity.

